Abstract

We present a linear time and space algorithm computing the leftmost critical factorization of a given string
on an unordered alphabet.

1. Introduction

Stringology and combinatorics on words are closely related fields that intensively interact with each 
other. One of the most famous examples of their interaction is the surprising application of the so-called 
critical factorization, a notion that was created inside the field of combinatorics on words for purely theoretic 
reasons (the precise definition is presented below). Critical factorizations are at the core of the constant 
space string matching algorithm by Crochemore and Perrin [3] and its real time variation by Breslauer,
Grossi, and Mignosi, which are, perhaps, the most elegant and simple string matching algorithms with
such time and space bounds.

It is known that a critical factorization can be found in linear time and constant space when the input
string is drawn from an ordered alphabet, i.e., when the alphabet is totally ordered and we can use symbol
comparisons that test for the relative order of symbols (see [?]). In [?] it was posed as an open problem
whether it is possible to find in linear time a critical factorization of a given string over an arbitrary
unordered alphabet, i.e., when our algorithm is allowed to perform only equality comparisons. In this paper
we answer this question affirmatively; namely, we describe a linear time algorithm finding the leftmost critical
factorization of a given string on an unordered alphabet. A similar result is known for unbordered conjugates,
a concept related to critical factorizations: Duval et al. [?] proposed a linear algorithm that finds
an unbordered conjugate of a given string on an arbitrary unordered alphabet. It is worth noting that
all previously known algorithms working on general alphabets could find only some critical factorization, while
our algorithm always finds the leftmost one. For the case of an integer alphabet, however, there is a linear
algorithm finding the leftmost critical factorization, but it uses some structures (namely, the Lempel-Ziv
decomposition) that cannot be computed in linear time on a general (even ordered) alphabet [10].

The paper is organized as follows. Section 2 contains some basic definitions and facts used throughout
the text. In Section 3 we present our first algorithm, and in Section 4 we prove that its running time is
$O(n \log n)$, where $n$ is the length of the input string. A more detailed analysis of this algorithm is given in
Section 5. In Section 6 we improve our first solution to obtain a linear algorithm. Finally, we conclude with
some remarks in Section 7.

2. Preliminaries 

We need the following basic definitions. A string $w$ over an alphabet $\Sigma$ is a map $\{1,2,\ldots,n\} \to \Sigma$,
where $n$ is referred to as the length of $w$, denoted by $|w|$. We write $w[i]$ for the $i$th letter of $w$ and $w[i..j]$ for
$w[i]w[i+1] \cdots w[j]$. Let $w[i..j]$ be the empty string for any $i > j$. A string $u$ is a substring (or a factor) of $w$
if $u = w[i..j]$ for some $i$ and $j$. The pair $(i,j)$ is not necessarily unique; we say that $i$ specifies an occurrence
of $u$ in $w$. A string can have many occurrences in another string. A substring $w[1..j]$ [respectively, $w[i..n]$]
is a prefix [respectively, suffix] of $w$. For integers $i$ and $j$, the set $\{k \in \mathbb{Z} : i \le k \le j\}$ (possibly empty) is
denoted by $[i..j]$. Denote $[i..j) = [i..j-1]$, $(i..j] = [i+1..j]$, and $(i..j) = [i+1..j-1]$. Our notation for arrays
is similar to that for strings: for example, $a[i..j]$ denotes an array indexed by the numbers $i, i+1, \ldots, j$.

Throughout the paper, we intensively use different periodic properties of strings. A string $u$ is called
a border of a string $w$ if it is both a prefix and a suffix of $w$. A string is unbordered if it has only trivial
borders: the empty string and the string itself. An integer $p$ is a period of $w$ if $0 < p \le |w|$ and $w[i] = w[i+p]$
for all $i = 1,2,\ldots,|w|-p$. It is well known that $p > 0$ is a period of $w$ iff $w$ has a border of the length
$|w| - p$. A string of the form $xx$, where $x$ is a nonempty string, is called a square. Let $w[i..j] = xx$ for
some $i$, $j$ and a nonempty string $x$; the position $i + |x|$ is called the center of the square $w[i..j]$. A string
$w$ is primitive if $w \ne x^k$ for any string $x$ and any integer $k > 1$. A string $v$ is a conjugate of a string $w$ if
$v = w[i..|w|]w[1..i-1]$ for some $i$.

Lemma 1 (see [?]). A string $w$ is primitive iff $w$ has an unbordered conjugate.
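The definitions above are easy to check mechanically on short strings. The following is a minimal brute-force sketch, not part of the original construction; all function names are illustrative, indices are 0-based as usual in Python, and the helper `check_small_words` simply tests the border/period duality and Lemma 1 on all short words over a two-letter alphabet.

```python
from itertools import product

def periods(w):
    """All periods p of w with 0 < p <= |w|."""
    n = len(w)
    return [p for p in range(1, n + 1)
            if all(w[t] == w[t + p] for t in range(n - p))]

def border_lengths(w):
    """Lengths of all borders of w, including the trivial ones (0 and |w|)."""
    n = len(w)
    return [l for l in range(n + 1) if w[:l] == w[n - l:]]

def is_unbordered(w):
    """True if w has only the trivial borders."""
    return border_lengths(w) == sorted({0, len(w)})

def is_primitive(w):
    """True if w is not x^k for any string x and any integer k > 1."""
    n = len(w)
    return not any(n % d == 0 and w == w[:d] * (n // d) for d in range(1, n))

def conjugates(w):
    """All rotations w[i..|w|] w[1..i-1] of w."""
    return [w[i:] + w[:i] for i in range(len(w))]

def check_small_words(max_len=7, alphabet="ab"):
    """Check 'p is a period iff |w| - p is a border length' and Lemma 1."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            assert periods(w) == sorted(n - l for l in border_lengths(w) if l < n)
            assert is_primitive(w) == any(is_unbordered(c) for c in conjugates(w))
    return True
```

For instance, `periods("abbaabba")` lists 4, 7, and 8, matching the borders of lengths 4, 1, and 0.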

Figure 1: Internal, right external, and left external local periods of the string abbaabba: $\mu(5) = 1$, $\mu(8) = 3$, and $\mu(4) = 4$, respectively.

Now we can introduce the main notion of this paper. The local period at a position $i$
(or centered at a position $i$) of $w$ is the minimal positive integer $\mu(i)$ such that the substring
$w[\max\{1, i-\mu(i)\}..\min\{|w|, i+\mu(i)-1\}]$ has the period $\mu(i)$ (see Figure 1). Informally, the local period
at a given position is the size of the smallest square centered at this position. We say that the local period
$\mu(i)$ is left external [respectively, right external] if $i - \mu(i) < 1$ [respectively, $i + \mu(i) - 1 > |w|$]; the local
period is external if it is either left external or right external. The local period is internal if it is not
external. Obviously, the local period at any position of $w$ is less than or equal to the minimal period of $w$. A
position $i$ of $w$ with the local period that is equal to the minimal period of $w$ is called a critical point; the
corresponding factorization $w[1..i-1] \cdot w[i..|w|]$ is called a critical factorization. The following remarkable
theorem holds.

Theorem 1 (see [?]). Let $w$ be a string with the minimal period $p > 1$. Any sequence of $p-1$ consecutive
positions of $w$ contains a critical point.
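To make the notion of a local period concrete, the sketch below computes $\mu(i)$ directly from the definition (positions are 1-indexed as in the text). It is a brute-force illustration with hypothetical function names, intended only for small examples such as the string of Figure 1; it is not the routine used by the algorithms of this paper.

```python
def local_period(w, i):
    """Local period mu(i) at the 1-indexed position i of w, by definition."""
    n, mu = len(w), 1
    while True:
        lo, hi = max(1, i - mu), min(n, i + mu - 1)   # clipped square around i
        # w[lo..hi] has period mu iff w[t] == w[t + mu] for all t in lo..hi-mu
        if all(w[t - 1] == w[t + mu - 1] for t in range(lo, hi - mu + 1)):
            return mu
        mu += 1

def is_left_external(w, i):
    return i - local_period(w, i) < 1

def is_right_external(w, i):
    return i + local_period(w, i) - 1 > len(w)

def minimal_period(w):
    """Smallest period of a nonempty string w."""
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[t] == w[t + p] for t in range(n - p)))

def critical_points(w):
    """All positions whose local period equals the minimal period of w."""
    p = minimal_period(w)
    return [i for i in range(1, len(w) + 1) if local_period(w, i) == p]
```

On the string abbaabba, whose minimal period is 4, this reports the local periods 1, 3, 1, 4, 1, 4, 1, 3 at the positions 1 through 8, so the critical points are the positions 4 and 6.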

Theorem 1 implies that any string with the minimal period $p$ has a critical point among the positions
$1, 2, \ldots, p$. Clearly, the local period corresponding to any such critical point is left external. The following
lemmas are straightforward.

Lemma 2. If the local period at a position of a given string is both left external and right external, then
this position is a critical point.

Lemma 3. If the local period $\mu(i)$ at a position $i$ of a given string $w$ is not right external [respectively, left
external], then the string $w[i..i+\mu(i)-1]$ [respectively, $w[i-\mu(i)..i-1]$] is unbordered.

3. $O(n \log n)$ Algorithm

Our construction is based on the following observation. 

Lemma 4. Let $w$ be a string with the minimal period $p > 1$. Denote $k = \max\{l : w[1..l] =
w[j..j+l-1] \text{ for some } j \in (1..p]\}$. The leftmost critical point of $w$ is the leftmost position $i > k + 1$ with
external local period.

Proof. Denote by $j$ a position such that $j \in (1..p]$ and $w[1..k] = w[j..j+k-1]$. Obviously, each of the
positions $1,2,\ldots,k+1$ has the local period that is at most $j-1 < p$ (see Figure 2) and hence cannot be a
critical point.

Figure 2: The local period at a position $i \in [1..k+1]$.

Consider a position $i$ with left external local period $\mu(i) < p$. By Lemma 2, $\mu(i)$ is not right external.
So, we have $w[1..i-1] = w[\mu(i)+1..i+\mu(i)-1]$. Since $\mu(i) + 1 \le p$, by the definition of $k$, we have $i - 1 \le k$.
Hence, any position $i > k + 1$ with left external local period is a critical point.

Now consider a position $i$ with right external local period $\mu(i) < p$. By Lemma 2, $\mu(i)$ is not left external.
It is easy to see that for any $i' \in (i..|w|]$, we have $\mu(i') \le \mu(i)$ and $i' - \mu(i') > 1$. Since Theorem 1 implies that
$w$ must have a critical point with left external local period, the position $i$ cannot be the leftmost position
in $(k+1..|w|]$ with external local period. □
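Lemma 4 can be read as a (slow) procedure: compute $k$, then scan the positions $k+2, k+3, \ldots$ until the first external local period appears. The following sketch does exactly that by brute force; the function names are illustrative, positions are 1-indexed, and the case $p = 1$ is rejected because the lemma assumes $p > 1$.

```python
def minimal_period(w):
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[t] == w[t + p] for t in range(n - p)))

def local_period(w, i):
    n, mu = len(w), 1
    while True:
        lo, hi = max(1, i - mu), min(n, i + mu - 1)
        if all(w[t - 1] == w[t + mu - 1] for t in range(lo, hi - mu + 1)):
            return mu
        mu += 1

def lemma4_leftmost_critical_point(w):
    """Leftmost position i > k + 1 with external local period (Lemma 4)."""
    n, p = len(w), minimal_period(w)
    if p == 1:
        raise ValueError("Lemma 4 assumes that the minimal period is > 1")
    # k = length of the longest prefix of w occurring at some j in (1..p]
    k = 0
    for j in range(2, p + 1):
        l = 0
        while j + l <= n and w[l] == w[j - 1 + l]:
            l += 1
        k = max(k, l)
    for i in range(k + 2, n + 1):
        mu = local_period(w, i)
        if i - mu < 1 or i + mu - 1 > n:      # left or right external
            return i
```

For abbaabba we get $k = 1$, the position 3 has the internal local period 1, and the position 4 is returned.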

Hereafter, $w$ denotes the input string of length $n$ with the minimal period $p$. We process the trivial case
$p = 1$ separately, so, assume $p > 1$. According to Theorem 1 and Lemma 4, our algorithm processes only the
first $p$ positions of $w$ from left to right starting from the position $k + 2$, where $k$ is defined as in Lemma 4,
and when a local period at a given position $i$ is computed, then the following positions are skipped while
they have at most the same local period. This leads to an $O(n \log n)$ time algorithm. To get a linear time
algorithm, some local periods are reported from previous positions due to some local properties that are
discussed in detail in Section 6. More precisely, our $O(n \log n)$ algorithm is as follows.

Algorithm 1

 1: compute $k = \max\{l : w[1..l] = w[j..j+l-1] \text{ for some } j \in (1..p]\}$;
 2: $i \leftarrow k + 2$;
 3: while true do
 4:     compute $\mu(i)$;
 5:     if $\mu(i)$ is external then
 6:         $i$ is the leftmost critical point; stop the algorithm;
 7:     $\mu \leftarrow \mu(i)$;
 8:     while $w[i-1] = w[i+\mu-1]$ do    ▷ skip positions that have local period at most $\mu$
 9:         $i \leftarrow i + 1$;


Obviously, the positions that the algorithm skips in lines 8–9 have the local period at most $\mu < p$ and
therefore cannot be critical points. So, Lemma 4 immediately implies the correctness of Algorithm 1.
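The control flow of Algorithm 1 can be sketched as follows. This is only an illustration under simplifying assumptions: the local period, the minimal period, and $k$ are computed by brute force (the helper names are hypothetical), so the sketch reproduces the output of Algorithm 1 but not its running time. The comments refer to the numbered lines of the pseudocode.

```python
def minimal_period(w):
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[t] == w[t + p] for t in range(n - p)))

def local_period(w, i):
    n, mu = len(w), 1
    while True:
        lo, hi = max(1, i - mu), min(n, i + mu - 1)
        if all(w[t - 1] == w[t + mu - 1] for t in range(lo, hi - mu + 1)):
            return mu
        mu += 1

def common_prefix(w, j):
    """Length of the longest l with w[1..l] = w[j..j+l-1] (j is 1-indexed)."""
    l = 0
    while j + l <= len(w) and w[l] == w[j - 1 + l]:
        l += 1
    return l

def algorithm1(w):
    """Leftmost critical point of w (1-indexed), following Algorithm 1."""
    n, p = len(w), minimal_period(w)
    if p == 1:
        raise ValueError("the trivial case p = 1 is processed separately")
    k = max(common_prefix(w, j) for j in range(2, p + 1))       # line 1
    i = k + 2                                                   # line 2
    while True:                                                 # line 3
        mu = local_period(w, i)                                 # line 4
        if i - mu < 1 or i + mu - 1 > n:                        # line 5
            return i                                            # line 6
        # lines 8-9: skip positions whose local period is at most mu;
        # the bound check stands in for a mismatch at the end of the string
        while i + mu - 1 <= n and w[i - 2] == w[i + mu - 2]:
            i += 1
```

A call such as `algorithm1("abbaabba")` visits the position 3, skips nothing beyond it, and stops at the position 4.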

To calculate the number $k$ in $O(n)$ time, we utilize the following fact.

Lemma 5 (see [5, Chapter 1.5]). For any strings $u$ and $w$, one can compute in $O(|u|)$ time an array $b[1..|u|]$
such that $b[j] = \max\{l : u[j..j+l-1] = w[1..l]\}$ for $j \in [1..|u|]$.
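One standard way to obtain such an array with equality comparisons only is the Z-function technique. The sketch below is an illustration and not necessarily the method of the cited chapter; the names `z_array` and `prefix_match_lengths` are hypothetical, the result is a 0-indexed Python list, and the value `None` is used as a separator that compares unequal to every letter.

```python
def z_array(s):
    """z[t] = length of the longest common prefix of s and s[t:]."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n
    left = right = 0              # s[left:right] is the rightmost match found so far
    for t in range(1, n):
        if t < right:             # reuse information from the enclosing match
            z[t] = min(right - t, z[t - left])
        while t + z[t] < n and s[z[t]] == s[t + z[t]]:
            z[t] += 1
        if t + z[t] > right:
            left, right = t, t + z[t]
    return z

def prefix_match_lengths(u, w):
    """b[j-1] = max{l : u[j..j+l-1] = w[1..l]} for j = 1..|u|."""
    head = list(w[:len(u)])       # only the first |u| letters of w can matter
    z = z_array(head + [None] + list(u))
    return z[len(head) + 1:]
```

Since the concatenated sequence has length at most $2|u| + 1$, the whole computation takes $O(|u|)$ time.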

To complete our construction, we describe an algorithm calculating the local period $\mu(i)$ at a given
position $i$ provided $\mu(i)$ is internal. If this algorithm fails to compute $\mu(i)$, we decide that the local period
is external.

Lemma 6. One can compute the internal local period $\mu(i)$ at a given position $i$ in $O(\mu(i))$ time and space.

Proof. Fix an integer $x < i$. Let us first describe an algorithm that finds $\mu(i)$ in $O(x)$ time and space
provided $\mu(i) \le x$. Using Lemma 5, our algorithm constructs in $O(x)$ time an array $b[i-x..i-1]$ (for clarity,
the indices start with $i-x$) of the length $x$ such that $b[j] = \max\{l : l \le x \text{ and } w[j..j+l-1] = w[i..i+l-1]\}$
for $j \in [i-x..i)$. It is straightforward that $\mu(i) = i - j$ for the rightmost $j \in [i-x..i)$ such that $b[j] \ge i - j$.

Now, to compute $\mu(i)$, we consecutively execute the above algorithm for $x = 2^0, 2^1, 2^2, \ldots, 2^{\lfloor \log(i-1) \rfloor}$
and, finally, for $x = i-1$ until we find $\mu(i)$. Thus, the algorithm runs in $O(\sum_{j=0}^{\lceil \log \mu(i) \rceil} 2^j) = O(\mu(i))$ time
and space. □
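The doubling search from the proof of Lemma 6 can be sketched as follows. The array $b$ is obtained here with a Z-function over the sequence "pattern, separator, text", which is one possible realization of Lemma 5; the function names are illustrative, positions are 1-indexed, and `None` is returned when no square centered at $i$ fits inside $w$, i.e., when the local period is external.

```python
def z_array(s):
    """z[t] = length of the longest common prefix of s and s[t:]."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n
    left = right = 0
    for t in range(1, n):
        if t < right:
            z[t] = min(right - t, z[t - left])
        while t + z[t] < n and s[z[t]] == s[t + z[t]]:
            z[t] += 1
        if t + z[t] > right:
            left, right = t, t + z[t]
    return z

def internal_local_period(w, i):
    """Return mu(i) if the local period at position i is internal, else None."""
    if i <= 1:
        return None
    x = 1
    while True:
        x = min(x, i - 1)
        pattern = list(w[i - 1:i - 1 + x])       # w[i..i+x-1], clipped at |w|
        text = list(w[i - 1 - x:i + x - 2])      # w[i-x..i+x-2], clipped at |w|
        z = z_array(pattern + [None] + text)
        b = z[len(pattern) + 1:]                 # b[t] belongs to j = i - x + t
        for d in range(1, x + 1):                # d = i - j, rightmost j first
            if b[x - d] >= d:                    # square of half-length d at i
                return d
        if x == i - 1:
            return None                          # no internal square exists
        x *= 2
```

Each round costs $O(x)$, and the rounds stop at the first $x \ge \mu(i)$, which gives the geometric sum from the proof.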

4. $O(n \log n)$ Time Bound

During the execution, Algorithm 1 calculates local periods at some positions. Let $S$ be the sequence of
all such positions in the input string $w$ in increasing order. It is easy to see that the running time of the
whole algorithm is $O(n + \sum_{i \in S} \mu(i))$. Thus, to prove that Algorithm 1 works in $O(n \log n)$ time, it suffices
to show that $\sum_{i \in S} \mu(i) = O(n \log n)$. Simplifying the discussion, we exclude from $S$ all positions $i$ such that
$\mu(i) = 1$.

Fix an arbitrary number $q$. Denote by $T(q)$ the maximal sum $\sum_{i \in S'} \mu(i)$ among all contiguous
subsequences $S'$ of $S$ such that $\mu(i) \le q$ for each $i \in S'$. We are to show that $T(q) = O(q \log q)$, which immediately
implies $\sum_{i \in S} \mu(i) = O(n \log n)$ since the number $q$ is arbitrary and $T(n) = \sum_{i \in S} \mu(i)$.

For further investigation, we need three additional combinatorial lemmas. Consider a position $i$ of $w$
with internal local period $\mu(i) > 1$. Informally, Lemma 7 shows that at the positions $(i..i+\mu(i))$ any internal
local period that “intersects” the position $i$ and is not equal to $\mu(i)$ is either “very short” ($< \frac{1}{2}\mu(i)$) or
“very long” ($\ge 2\mu(i)$). Lemma 8 claims that always there is a “long” local period centered at $(i..i+\mu(i))$;
moreover, this local period either is equal to $\mu(i)$ or is “very long” ($\ge 2\mu(i)$). Lemma 9 connects the bounds
on the internal local periods that “intersect” the position $i$, as in Lemma 7, and those local periods that do
not “intersect” the position $i$. Now let us formulate these facts precisely.

Lemma 7. Let $i$ be a position of $w$ with internal local period $\mu(i) > 1$. For any $j \in (i..i+\mu(i))$ such that
$j - \mu(j) < i$ and $\mu(j) \ne \mu(i)$, we have either $\mu(j) < \frac{1}{2}\mu(i)$ or $\mu(j) \ge 2\mu(i)$.

Proof. The proof is essentially the same as in [?, Lemma 2]. Let $\mu(j) \ge \frac{1}{2}\mu(i)$. Suppose $\mu(j) = \frac{1}{2}\mu(i)$.
Since, by Lemma 3, the string $w[i..i+\mu(i)-1]$ is unbordered and hence cannot have the period $\mu(j) < \mu(i)$,
we obtain $j + \mu(j) < i + \mu(i)$. The string $w[j-\mu(j)..j+\mu(j)-1]$ is not primitive and has the length $\mu(i)$.
Thus, the string $w[i..i+\mu(i)-1]$ is a conjugate of $w[j-\mu(j)..j+\mu(j)-1]$ and therefore is not primitive, a
contradiction.

Figure 3: Two impossible cases in Lemma 7: (a) $\mu(i)/2 < \mu(j) < \mu(i)$, (b) $\mu(i) < \mu(j) < 2\mu(i)$.

Now suppose $\mu(i)/2 < \mu(j) < \mu(i)$. As above, we have $j + \mu(j) < i + \mu(i)$. Thus, the string $w[j..j+\mu(j)-1]$
has an occurrence $w[j-\mu(i)..j-\mu(i)+\mu(j)-1]$ that overlaps the string $w[j-\mu(j)..j-1] = w[j..j+\mu(j)-1]$
because $2\mu(j) > \mu(i)$ (see Figure 3a). But, by Lemma 3, $w[j..j+\mu(j)-1]$ is unbordered and therefore
cannot overlap its own copy. This is a contradiction.

Finally, suppose $\mu(j) > \mu(i)$. By Lemma 3, $w[j-\mu(j)..j-1]$ is unbordered. If $j - \mu(j) \ge i - \mu(i)$,
then $w[j-\mu(j)..j-1]$ has the period $\mu(i) < \mu(j)$, a contradiction. Hence, we have $j - \mu(j) < i - \mu(i)$.
If $\mu(j) < 2\mu(i)$, then the string $w[j..i+\mu(i)-1]$, which is a suffix of $w[i..i+\mu(i)-1]$, has an occurrence
$w[j-\mu(j)..i+\mu(i)-\mu(j)-1]$ that overlaps $w[i-\mu(i)..i-1] = w[i..i+\mu(i)-1]$ (see Figure 3b). This is a
contradiction because, by Lemma 3, $w[i-\mu(i)..i-1]$ is unbordered. □

Lemma 8. Let $i$ be a position of $w$ with internal local period $\mu(i) > 1$. Then there exists $j \in (i..i+\mu(i))$
such that either $\mu(j) = \mu(i)$ or $\mu(j) \ge 2\mu(i)$.

Proof. By Lemma 3, the string $w[i..i+\mu(i)-1]$ is unbordered and its minimal period is $\mu(i)$. For any position
$j \in (i..i+\mu(i))$, denote by $\mu'(j)$ the local period in $j$ with respect to the substring $w[i..i+\mu(i)-1]$. Observe
that $\mu'(j) \le \mu(j)$. By Theorem 1, there is $j \in (i..i+\mu(i))$ such that $\mu'(j) = \mu(i)$ and $j - \mu'(j) < i$. Hence,
we have $\mu(j) \ge \mu(i)$ and, moreover, if $\mu(j) > \mu(i)$, then, by Lemma 7, $\mu(j) \ge 2\mu(i)$. □

Lemma 9. Let $i$ be a position of $w$ with internal local period $\mu(i) > 1$. Fix $j \in (i..i+\mu(i))$. Then, for any
$h \in (i..j]$ such that $\mu(h) > 1$, we have $\mu(h) \le \max\{\mu(h') : h' \in (i..j] \text{ and } h' - \mu(h') < i\}$.

Proof. Suppose, to the contrary, there is $h \in (i..j]$ such that $\mu(h) > 1$ and $\mu(h) > \max\{\mu(h') : h' \in
(i..j] \text{ and } h' - \mu(h') < i\}$; let $h$ be the leftmost such position. Then, we have $h - \mu(h) \ge i$. Using a
symmetrical version of Lemma 8, we obtain $h' \in (h-\mu(h)..h)$ such that $\mu(h') \ge \mu(h)$. Since $\mu(h') \ge \mu(h)$,
by the definition of $h$, we have $h' - \mu(h') \ge i$. This contradicts the choice of $h$ as the leftmost position
with the given properties because $h' < h$ and $h' \in (i..j]$. □

Hereafter, $S' = \{i_1, i_2, \ldots, i_z\}$ denotes a contiguous subsequence of $S$ such that $\mu(i_j) \le q$ for each
$j \in [1..z]$ and $T(q) = \sum_{j=1}^{z} \mu(i_j)$. We associate with each $i_j$ the numbers $r_j = \max\{r : w[i_j-\mu(i_j)..r-1] \text{ has
the period } \mu(i_j)\}$ and $c_j = \max\{c \le r_j-\mu(i_j) : w[c..c+\mu(i_j)-1] \text{ is unbordered}\}$ (see Figure 4). By Lemma 3,
the string $w[i_j..i_j+\mu(i_j)-1]$ is unbordered and therefore $c_j \ge i_j$. Since $w[c_j..c_j+\mu(i_j)-1]$ is unbordered
and $w[c_j-\mu(i_j)..c_j-1] = w[c_j..c_j+\mu(i_j)-1]$, we have $\mu(c_j) = \mu(i_j)$. Since $w[r_j-\mu(i_j)..r_j-1]$ is primitive, it
follows from Lemma 1 that $c_j > r_j - 2\mu(i_j)$. Algorithm 1 skips the positions $i_j+1, i_j+2, \ldots, r_j-\mu(i_j)$ in
the loop in lines 8–9.

Figure 4: The positions $i_j+1, \ldots, r_j-\mu(i_j)$ are shaded.

Lemma 10. For any $j \in [1..z]$ and $i \in (c_j..c_j+\mu(c_j))$, we have $\mu(i) \ne \mu(c_j)$.

Proof. Suppose, to the contrary, $\mu(i) = \mu(c_j)$. Since $w[i-\mu(i)..i-1] = w[i..i+\mu(i)-1]$ and $\mu(i) = \mu(c_j) = \mu(i_j)$,
by the definition of $r_j$, we have $i \le r_j - \mu(i_j)$. It follows from Lemma 3 that $w[i..i+\mu(i)-1]$ is unbordered.
This contradicts the definition of $c_j$ because $c_j < i \le r_j - \mu(i_j)$. □

To estimate the sum $\sum_{j=1}^{z} \mu(i_j)$, we construct a subsequence $i_{s_1}, i_{s_2}, \ldots, i_{s_t}$ by the following inductive
process. Choose $i_{s_1} = i_1$. Suppose we have already constructed a subsequence $i_{s_1}, i_{s_2}, \ldots, i_{s_j}$. Choose the
minimal number $i' \in (c_{s_j}..c_{s_j}+\mu(c_{s_j}))$ such that $\mu(i') \ge \mu(c_{s_j})$. By Lemma 8, such a number always exists.
If $i' > i_z$, we set $t = j$ and stop the process. Let $i' \le i_z$. It follows from Lemma 10 that $\mu(i') \ne \mu(c_{s_j})$.
Hence, by Lemma 7, $\mu(i') \ge 2\mu(c_{s_j}) = 2\mu(i_{s_j})$. Since $\mu(i') > \mu(i_{s_j})$, it follows from the definition of $r_{s_j}$ that
$i' > r_{s_j} - \mu(i_{s_j})$. Therefore, Algorithm 1 does not skip $i'$ and $i' \in S$. Since $\{i_1, i_2, \ldots, i_z\}$ is a contiguous
subsequence of $S$, we have $i' = i_{j'}$ for some $j' \in [1..z]$. Set $i_{s_{j+1}} = i_{j'}$.

Now we can prove that the running time of Algorithm 1 is $O(n \log n)$. For any $j \in [1..t)$, we have
$2\mu(i_{s_j}) \le \mu(i_{s_{j+1}})$ and therefore $\sum_{j=1}^{t} \mu(i_{s_j}) \le \mu(i_{s_t}) + \frac{1}{2}\mu(i_{s_t}) + \frac{1}{4}\mu(i_{s_t}) + \cdots \le 2\mu(i_{s_t}) \le 2q$. Further,
let $h \in [1..z]$ and $i_{s_j} < i_h < i_{s_{j+1}}$ for some $j \in [1..t)$. Since Algorithm 1 skips the positions $(i_{s_j}..c_{s_j}]$ and
$i_{s_{j+1}} \in (c_{s_j}..c_{s_j}+\mu(c_{s_j}))$, it follows that $i_h \in (c_{s_j}..c_{s_j}+\mu(c_{s_j}))$. Recall that $i_{s_{j+1}}$ is the minimal number from
$(c_{s_j}..c_{s_j}+\mu(c_{s_j}))$ such that $\mu(i_{s_{j+1}}) \ge \mu(c_{s_j})$. Thus, by Lemmas 7 and 9, we have $\mu(i_h) < \frac{1}{2}\mu(c_{s_j}) = \frac{1}{2}\mu(i_{s_j})$.
In the same way, for $h \in [1..z]$ such that $i_h > i_{s_t}$, we have $\mu(i_h) < \frac{1}{2}\mu(i_{s_t})$. So, we obtain the following
recursion:

$T(q) \le 2q + T(\frac{1}{2}\mu(i_{s_1})) + T(\frac{1}{2}\mu(i_{s_2})) + \cdots + T(\frac{1}{2}\mu(i_{s_t}))$.    (1)

Consider a recursion $T(q) = O(q) + \sum_j T(q_j)$. It is well known that if the sum of the terms from the
parentheses of $T(\ldots)$ in the right hand side of this recursion (i.e., $\sum_j q_j$) is less than or equal to $q$ and each
of those terms (i.e., each $q_j$) is less than or equal to $\frac{1}{2}q$, then the recursion has a solution $T(q) = O(q \log q)$.
Thus, since the sum of the terms from the parentheses of $T(\ldots)$ in the right hand side of (1) is equal to
$\frac{1}{2}(\mu(i_{s_1}) + \cdots + \mu(i_{s_t})) \le q$ and each of these terms is less than or equal to $\frac{1}{2}q$, we obtain $T(q) = O(q \log q)$.
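The solution of this recursion can be verified by a short induction. The derivation below is a sketch under the conditions stated in the text ($\sum_j q_j \le q$ and $q_j \le \frac{1}{2}q$ for each $j$) together with two additional assumptions made here for concreteness: the $O(q)$ term is at most $2q$, and $T(q') = 0$ for $q' < 2$ (which holds in our setting because the positions with local period 1 are excluded from $S$). The sums range over those $j$ with $q_j \ge 2$, and the constant $c$ is introduced only for this illustration.

```latex
% Induction hypothesis: T(q') \le c\, q' \log_2 q' for all 2 \le q' < q.
\begin{align*}
T(q) &\le 2q + \sum_j T(q_j)
      \le 2q + \sum_j c\, q_j \log_2 q_j \\
     &\le 2q + c \log_2\!\Big(\frac{q}{2}\Big) \sum_j q_j
      \le 2q + c\, q\, (\log_2 q - 1) \\
     &= c\, q \log_2 q + (2 - c)\, q
      \le c\, q \log_2 q \qquad \text{for any } c \ge 2 .
\end{align*}
```

The step from the first line to the second uses $q_j \le \frac{1}{2}q$, and the next step uses $\sum_j q_j \le q$.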


5. Problems with Linearity 

To obtain $T(q) = O(q)$, we might prove that if $\mu(i_{s_{t-1}})$ and $\mu(i_{s_t})$ are close enough (namely, $\mu(i_{s_{t-1}}) > \frac{3}{7}\mu(i_{s_t})$),
the term $T(\frac{1}{2}\mu(i_{s_t}))$ in (1) is actually $T(\frac{2}{3}\mu(i_{s_{t-1}})) \le T(\frac{1}{3}\mu(i_{s_t}))$; this fact would imply that the
sum of the terms in the parentheses of $T(\ldots)$ in the right hand side of (1) is less than $\alpha q$ for some constant
$\alpha < 1$ and therefore $T(q) = O(q)$. Unfortunately, this is not true for Algorithm 1. Nevertheless, we prove
a restricted version of the mentioned claim. It reveals problems that may arise in the current solution and
points out a way to improvements.

Lemma 11. Let $i \in (c_{s_t}..c_{s_t}+\mu(c_{s_t}))$. Suppose $\mu(i') < \mu(c_{s_t})$ and $\mu(i') \ne \mu(i_{s_{t-1}})$ for each $i' \in (c_{s_t}..i]$. If
$\mu(i_{s_{t-1}}) > \frac{3}{7}\mu(i_{s_t})$, then $\mu(i) < \frac{2}{3}\mu(i_{s_{t-1}})$.

Proof. Recall that $2\mu(c_{s_{t-1}}) \le \mu(i_{s_t})$. Denote $a = w[c_{s_{t-1}}..c_{s_{t-1}}+\mu(c_{s_{t-1}})-1]$ and $b =
w[c_{s_{t-1}}+\mu(c_{s_{t-1}})..c_{s_{t-1}}-\mu(c_{s_{t-1}})+\mu(i_{s_t})-1]$ (see Figure 5). Note that $\mu(c_{s_{t-1}}) = |a|$ and $\mu(c_{s_t}) = |aab|$.
It follows from Lemma 3 that $a$ is unbordered. Since, by Lemma 3, the string $w[i_{s_t}..i_{s_t}+\mu(i_{s_t})-1]$ is
unbordered, the string $b$ is not empty. The inequality $\frac{7}{3}|a| = \frac{7}{3}\mu(i_{s_{t-1}}) > \mu(i_{s_t}) = |baa|$ implies $|b| < \frac{1}{3}|a|$.

Figure 5: The strings $a$ and $b$; $\mu(i_{s_t}) = |aab|$.



In view of Lemma 9, it suffices to prove the lemma only for the positions $i$ such that $i - \mu(i) < c_{s_t}$. So,
assume $i - \mu(i) < c_{s_t}$. Since $\mu(i) < \mu(c_{s_t})$, it follows from Lemma 7 that $\mu(i) < \frac{1}{2}\mu(c_{s_t}) = \frac{1}{2}|baa| < |ab|$.
Since, by Lemma 3, $w[c_{s_t}..c_{s_t}+\mu(c_{s_t})-1]$ is unbordered and thus cannot have the period $\mu(i) < \mu(c_{s_t})$, we
obtain $i + \mu(i) < c_{s_t} + \mu(c_{s_t})$. So, $w[i-\mu(i)..i+\mu(i)-1]$ is a substring of the string $w[i_{s_t}-\mu(i_{s_t})..r_{s_t}-1]$.
Therefore, since $w[i_{s_t}-\mu(i_{s_t})..r_{s_t}-1]$ has the period $\mu(i_{s_t}) = \mu(c_{s_t}) = |aab|$, the string $w[i-\mu(i)..i+\mu(i)-1]$
is a substring of the string $u = aabaabaab$ (see Figure 5). Thus, to finish the proof, it suffices to prove the
following claim.

Claim. Let $i$ be a position of $u$ with internal local period $\mu(i)$ (the local period at $i$ is with respect to the
string $u$). If $\mu(i) < |ab|$ and $\mu(i) \ne |a|$, then $\mu(i) < \frac{2}{3}|a|$.

Let $i$ be a position of $u$ with internal local period $\mu(i)$ such that $\mu(i) < |ab|$ and $\mu(i) \ne |a|$. Consider two
cases.

1) Suppose $i$ lies in an occurrence of $a$ in $u = aabaabaab$. Without loss of generality, consider the case
$i \in (|aaba|..|aabaa|]$; all other cases are similar. If $i - \mu(i) < |aaba|$, then, by Lemma 7, we have either
$\mu(i) < \frac{1}{2}|a|$ or $\mu(i) \ge 2|a|$. The latter is impossible because $\mu(i) < |ab| < 2|a|$ while the former implies
$\mu(i) < \frac{2}{3}|a|$ as required. Now let $i - \mu(i) \ge |aaba|$. Assume, for contradiction, that $\mu(i) \ge \frac{2}{3}|a|$. Then
$u[i-\mu(i)..i-1]$ is a substring of $a$ and thus it has an occurrence $v = u[i-\mu(i)+|ab|..i-1+|ab|]$ (see Figure 6).
Since $2\mu(i) \ge \frac{4}{3}|a| > |ab|$, the string $u[i..i+\mu(i)-1]$, which is also an occurrence of $u[i-\mu(i)..i-1]$, overlaps
$v$. This is a contradiction because $u[i-\mu(i)..i-1]$ is unbordered by Lemma 3.

2) Suppose $i$ lies in an occurrence of $b$ in $u = aabaabaab$. Without loss of generality, consider the case $i \in
(|aa|..|aab|]$. Assume, for contradiction, that $\mu(i) \ge \frac{2}{3}|a|$. Suppose $i - \mu(i) > |a|$ (see Figure 7a). Then the
string $u[i-\mu(i)..|aa|]$, which is a suffix of $a$, has an occurrence $v = u[i..|aa|+\mu(i)]$. Since $\mu(i) \ge \frac{2}{3}|a| > |b|$, $v$
overlaps $u[|aab|+1..|aaba|] = a$. Hence, $a$ has a nontrivial border, clearly a contradiction. Suppose $i - \mu(i) \le
|a|$ (see Figure 7b). Then the string $u[|a|+1..|aa|] = a$ has an occurrence $v = u[|a|+1+\mu(i)..|aa|+\mu(i)]$.
Since $\mu(i) < |ab|$ and $\mu(i) + |a| \ge \frac{5}{3}|a| > |ab|$, the string $u[|aab|+1..|aaba|] = a$ overlaps $v = a$. This is a
contradiction because $a$ is unbordered. □

Figure 6: The impossible case $i \in (|aaba|..|aabaa|]$ and $i - \mu(i) \ge |aaba|$ from the proof of Lemma 11.

Figure 7: The impossible cases for $i \in (|aa|..|aab|]$ in the proof of Lemma 11: (a) $i - \mu(i) > |a|$; (b) $i - \mu(i) \le |a|$.


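Let us consider how one might use Lemma 11 to obtain $T(q) = O(q)$. Suppose $t > 1$, $\mu(i_{s_{t-1}}) > \frac{3}{7}\mu(i_{s_t})$,
and $\mu(i_h) \ne \mu(i_{s_{t-1}})$ for all $h \in (s_t..z]$. Lemma 11 implies that $\mu(i_h) < \frac{2}{3}\mu(i_{s_{t-1}}) \le \frac{1}{3}\mu(i_{s_t})$ for each
$h \in (s_t..z]$. So, combining Lemmas 7, 9, and 11, one can deduce the following recursion:

$T(q) \le \sum_{j=1}^{t} \mu(i_{s_j}) + T(\frac{1}{2}\mu(i_{s_1})) + \cdots + T(\frac{1}{2}\mu(i_{s_{t-1}})) + T(\frac{1}{3}\mu(i_{s_t}))$.    (2)

Let us estimate the sum of the terms from the parentheses of $T(\ldots)$ in the right hand side of (2). Since
$2\mu(i_{s_j}) \le \mu(i_{s_{j+1}})$ for $j \in [1..t)$ and $\mu(i_{s_t}) \le q$, we have $\frac{1}{2}\mu(i_{s_1}) + \cdots + \frac{1}{2}\mu(i_{s_{t-1}}) + \frac{1}{3}\mu(i_{s_t}) \le \frac{1}{2}q + \frac{1}{3}q = \frac{5}{6}q$.
The sum $\sum_{j=1}^{t} \mu(i_{s_j})$ is bounded by $2q$. It is well known that such a recursion has a solution
$T(q) \le 2q + \frac{5}{6} \cdot 2q + (\frac{5}{6})^2 \cdot 2q + \cdots = O(q)$.

Unfortunately, a fatal problem arises when there is $h \in (s_t..z]$ such that $\mu(i_h) = \mu(i_{s_{t-1}})$. Exploiting this
case, we construct a string on which Algorithm 1 performs $\Omega(n \log n)$ operations.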

Example. Let $a_i$ and $b_i$ be sequences of strings inductively defined as follows: $a_0 = a$, $b_0 = b$ and
$a_{i+1} = a_i \$_i a_i$, $b_{i+1} = b_i a_i \$_i a_i b_i$, where $a, b, \$_0, \$_1, \$_2, \ldots$ are distinct letters. Denote $w_i = a_i b_i a_i$. Note
that $w_{i+1} = a_i \$_i w_i \$_i w_i \$_i a_i$; this recursive structure of $w_{i+1}$ is very important for us. Our counterexample
is the string $w = \# w_{i+1} \# a_{i+1} \#$, where $\#$ is a special letter distinct from all others. Clearly, the minimal period of $w$
is $|w|-1$. Since $w = \# a_{i+1} b_{i+1} a_{i+1} \# a_{i+1} \#$, it is easy to see that the number $k = \max\{l : w[1..l] =
w[j..j+l-1] \text{ for some } j \in (1..|w|)\}$ is equal to $|\# a_{i+1}|$. So, Algorithm 1 starts with the position $|\# a_{i+1}|+2$.
Now consider some combinatorial properties of $w_i$.

Lemma 12. The string $w_i = a_i b_i a_i$ satisfies the following conditions:

(1) the local period at each of the positions $[|a_i|+2..|a_i b_i|]$ is internal;

(2) the local period at position $|a_i b_i|+1$ is right external.

Proof. The proof is by induction on $i$. The base case $w_0 = aba$ is obvious. The inductive step is
$w_{i+1} = a_{i+1} b_{i+1} a_{i+1} = a_i \$_i a_i \cdot b_i a_i \$_i a_i b_i \cdot a_i \$_i a_i = a_i \$_i w_i \$_i w_i \$_i a_i$. Consider condition (1). The positions
$[|a_{i+1}|+2..|a_{i+1} b_i|]$ correspond to the positions $[|a_i|+2..|a_i b_i|]$ of the first occurrence of the string $w_i = a_i b_i a_i$
in $w_{i+1}$. Hence, by the inductive hypothesis, the local periods at these positions are internal. It is obvious
that $p = |a_i \$_i a_i b_i|$ is a period of $w_{i+1}$ and therefore the positions $(p..|w_{i+1}|-p+1]$ all have internal local periods.
So, it suffices to consider the positions $[|w_{i+1}|-p+2..|a_{i+1} b_{i+1}|] = [|a_{i+1} b_i a_i \$_i a_i|+2..|a_{i+1} b_{i+1}|]$. Similarly, these
positions correspond to the positions $[|a_i|+2..|a_i b_i|]$ of the second occurrence of the substring $w_i = a_i b_i a_i$ in
$w_{i+1}$. Therefore, by the inductive hypothesis, all these positions have internal local periods. Consider
condition (2). Denote $j = |a_{i+1} b_{i+1}|+1$. By the inductive hypothesis, $\mu(j) > |a_i|$. Now since $w_{i+1}[j+|a_i|] = \$_i$, it is
easy to see that $\mu(j) > |a_{i+1}|$, i.e., $\mu(j)$ is right external. □


The main loop of Algorithm 1 starts with the position $|\# a_{i+1}|+2 = |\# a_i \$_i a_i|+2$, i.e., with the position
$|a_i|+2$ inside the first occurrence of $w_i$ in $w_{i+1} = a_i \$_i w_i \$_i w_i \$_i a_i$. By Lemma 12, we process $w_i$ until the
position $|a_i b_i|+1$ in $w_i$ that corresponds to the position $j = |\# a_i \$_i a_i b_i|+1$ in $w$ is reached. By Lemma 12,
we have $\mu(j) > |a_i|$. Hence, it is straightforward that $\mu(j) = |a_i \$_i a_i b_i|$, which is a period of the whole string
$w_{i+1}$. Algorithm 1 calculates $\mu(j)$ and then skips some positions in the loop in lines 8–9 until it reaches
the position $j' = |\# a_i \$_i w_i \$_i a_i|+2$, all in $O(|w_{i+1}|)$ time. The position $j'$ corresponds to the position $|a_i|+2$
inside the second occurrence of $w_i$ in $w_{i+1} = a_i \$_i w_i \$_i w_i \$_i a_i$. So, we have some kind of recursion here.

Denote by $t_{i+1}$ the time required to process the substring $w_{i+1}$ of $w$; it follows from our discussion that $t_{i+1}$
can be expressed by the following recursive formula: $t_{i+1} = O(|w_{i+1}|) + 2t_i$ (with $t_0 = 0$). For simplicity,
assume that the constant under the $O$ is 1, so, $t_{i+1} = |w_{i+1}| + 2t_i$.

To estimate $t_{i+1}$, we first solve the following recursions: $|a_{i+1}| = 2|a_i| + 1$, $|b_{i+1}| = 2|b_i| + 2|a_i| + 1$,
$|w_i| = 2|a_i| + |b_i|$ (with $|a_0| = |b_0| = 1$). Obviously $|a_i| = 2^{i+1} - 1$. Then $|b_{i+1}| = 2^{i+2} - 1 + 2|b_i|$. By a
simple substitution, one can show that $|b_i| = i2^{i+1} + 1$. So, we obtain $|w_i| = i2^{i+1} + 2^{i+2} - 1$ and therefore
$t_i = i2^{i+1} + 2^{i+2} - 1 + 2t_{i-1}$. By a substitution, one can prove that $t_i = i^2 2^i + 5i2^i - 2^i + 1$: indeed,
substituting $t_{i-1} = (i-1)^2 2^{i-1} + 5(i-1)2^{i-1} - 2^{i-1} + 1$, we obtain

$t_i = i2^{i+1} + 2^{i+2} - 1 + 2t_{i-1}$
$\phantom{t_i} = i2^{i+1} + 2^{i+2} - 1 + ((i-1)^2 2^i + 5(i-1)2^i - 2^i + 2)$
$\phantom{t_i} = i^2 2^i - 2i2^i + 2^i + 5i2^i - 5 \cdot 2^i - 2^i + 2i2^i + 4 \cdot 2^i + 1$
$\phantom{t_i} = i^2 2^i + 5i2^i - 2^i + 1$.

Finally, since $|w_{i+1}| = (i+1)2^{i+2} + 2^{i+3} - 1 = \Theta(i2^i)$ and $\log |w_{i+1}| = \Theta(i)$, we obtain $t_{i+1} = (i+1)^2 2^{i+1} +
5(i+1)2^{i+1} - 2^{i+1} + 1 = \Theta(i^2 2^i) = \Theta(|w_{i+1}| \log |w_{i+1}|) = \Theta(|w| \log |w|)$.
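The length formulas and the closed form for $t_i$ can be checked numerically. The sketch below builds the strings of the Example as tuples of letters (the letter $\$_i$ is represented by the pair `("$", i)`, an encoding chosen here only for illustration) and evaluates the recursion $t_i = |w_i| + 2t_{i-1}$ directly; the function names are hypothetical.

```python
def build_example(levels):
    """Return [(a_0, b_0, w_0), ..., (a_levels, b_levels, w_levels)]."""
    a, b = ("a",), ("b",)
    out = []
    for i in range(levels + 1):
        out.append((a, b, a + b + a))            # w_i = a_i b_i a_i
        sep = (("$", i),)                        # the letter $_i, distinct for each i
        a, b = a + sep + a, b + a + sep + a + b  # a_{i+1} and b_{i+1}
    return out

def t_recursive(i, w_lengths):
    """t_0 = 0 and t_i = |w_i| + 2 t_{i-1}."""
    return 0 if i == 0 else w_lengths[i] + 2 * t_recursive(i - 1, w_lengths)

def t_closed_form(i):
    """The closed form i^2 2^i + 5 i 2^i - 2^i + 1 stated in the text."""
    return i * i * 2 ** i + 5 * i * 2 ** i - 2 ** i + 1

def minimal_period(w):
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[t] == w[t + p] for t in range(n - p)))

def counterexample(level):
    """The string # w_level # a_level # for level >= 1, as a tuple of letters."""
    a, _, w = build_example(level)[level]
    return ("#",) + w + ("#",) + a + ("#",)
```

For example, $|w_2| = 31$ and $t_2 = 53$, in agreement with both the recursion and the closed form.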


6. Linear Algorithm 


To overcome the issues addressed in the previous section, we introduce two auxiliary arrays $m[1..n]$ and
$r[1..n]$ that are initially filled with zeros; their meaning is clarified by Lemma 13 below. In Algorithm 2
below we use the three-operand for loop as in the C language.


Lemma 13. If $m[i] \ne 0$ for some position $i$ during the execution of Algorithm 2, then $m[i] = \mu(i)$ and
$r[i] = \max\{r : w[i..r-1] \text{ has the period } \mu(i)\}$.

Proof. For each position $j$, denote $r_j = \max\{r : w[j..r-1] \text{ has the period } \mu(j)\}$. It suffices to show that
the assignments in lines 14–15 always assign $\mu(j+m[i])$ to $m[j+m[i]]$ and $r_{j+m[i]}$ to $r[j+m[i]]$. Suppose
Algorithm 2 performs line 15 for some $j$. Evidently, the string $w[i-m[i]..r[i]-1]$ has the period $m[i]$ (see
Figure 8). Further, by the condition in line 13, the strings $w[j-m[j]..r[j]]$ and $w[j-m[j]+m[i]..r[j]+m[i]]$
are substrings of $w[i-m[i]..r[i]-1]$ and therefore they are equal. Hence, we have $\mu(j) = \mu(j+m[i])$ and
$r_j + m[i] = r_{j+m[i]}$ provided $\mu(j) = m[j]$ and $r_j = r[j]$. Now one can prove the desired claim by a simple
induction. □

Figure 8: $j - m[j] \ge i - m[i]$ and $r[j] + m[i] < r[i]$.


































Algorithm 2

 1: compute $k = \max\{l : w[1..l] = w[j..j+l-1] \text{ for some } j \in (1..p]\}$;
 2: $i \leftarrow k + 2$;
 3: while true do
 4:     if $m[i] = 0$ then    ▷ $m[i]$ is not computed
 5:         compute $\mu(i)$;
 6:         if $\mu(i)$ is external then
 7:             $i$ is the leftmost critical point; stop the algorithm;
 8:         $m[i] \leftarrow \mu(i)$;
 9:         $r[i] \leftarrow i + m[i]$;
10:         while $w[r[i]-m[i]] = w[r[i]]$ do
11:             $r[i] \leftarrow r[i] + 1$;
12:         for ($j \leftarrow i - m[i]$; $j < r[i]-m[i]$; $j \leftarrow j + 1$) do
13:             if $m[j] \ne 0$ and $j - m[j] \ge i - m[i]$ and $r[j] + m[i] < r[i]$ then
14:                 $m[j+m[i]] \leftarrow m[j]$;
15:                 $r[j+m[i]] \leftarrow r[j] + m[i]$;
16:     $i \leftarrow r[i] - m[i] + 1$;


By Lemma 13, the assignment in line 16 skips exactly the same set of positions as the loop in lines 8–9 in
Algorithm 1. Thus, Lemma 13 implies that the values $m[i] = \mu(i)$ computed by Algorithm 2 coincide with
the same values computed by Algorithm 1 and hence are correct. However, now we do not compute some
local periods but copy them from the array $m$ instead. It turns out that this is crucial for the time analysis.
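The control flow of Algorithm 2 can be sketched in the same style. As an assumption made for brevity, $\mu(i)$, $p$, and $k$ are computed by brute-force helpers with hypothetical names, so the sketch illustrates the bookkeeping with the arrays $m$ and $r$ but does not attain the linear bound; the copy loop is placed inside the branch that computes $\mu(i)$ explicitly, and the comments refer to the numbered lines of the pseudocode.

```python
def minimal_period(w):
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[t] == w[t + p] for t in range(n - p)))

def local_period(w, i):
    n, mu = len(w), 1
    while True:
        lo, hi = max(1, i - mu), min(n, i + mu - 1)
        if all(w[t - 1] == w[t + mu - 1] for t in range(lo, hi - mu + 1)):
            return mu
        mu += 1

def common_prefix(w, j):
    l = 0
    while j + l <= len(w) and w[l] == w[j - 1 + l]:
        l += 1
    return l

def algorithm2(w):
    """Leftmost critical point of w (1-indexed), following Algorithm 2."""
    n, p = len(w), minimal_period(w)
    if p == 1:
        raise ValueError("the trivial case p = 1 is processed separately")
    m = [0] * (n + 2)                      # m[i] = mu(i) once known, else 0
    r = [0] * (n + 2)                      # r[i] = end of the run with period m[i]
    k = max(common_prefix(w, j) for j in range(2, p + 1))        # line 1
    i = k + 2                                                    # line 2
    while True:                                                  # line 3
        if m[i] == 0:                                            # line 4
            mu = local_period(w, i)                              # line 5
            if i - mu < 1 or i + mu - 1 > n:                     # line 6
                return i                                         # line 7
            m[i], r[i] = mu, i + mu                              # lines 8-9
            while r[i] <= n and w[r[i] - mu - 1] == w[r[i] - 1]:  # lines 10-11
                r[i] += 1
            for j in range(i - mu, r[i] - mu):                   # line 12
                if m[j] != 0 and j - m[j] >= i - mu and r[j] + mu < r[i]:  # line 13
                    m[j + mu] = m[j]                             # line 14
                    r[j + mu] = r[j] + mu                        # line 15
        i = r[i] - m[i] + 1                                      # line 16
```

On every input with $p > 1$ the returned position should coincide with the leftmost position whose local period equals $p$.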

As above, let $S$ be the sequence of all positions that Algorithm 2 does not skip in line 16. Again, we
exclude from $S$ all positions $i$ such that $\mu(i) = 1$. Evidently, the resulting sequence is exactly the same as
the sequence $S$ in Section 4 but, in contrast to Algorithm 1, the new algorithm copies local periods at some
positions of $S$ from the array $m$ rather than calculates them explicitly. Denote by $\hat{S}$ the subsequence of all
positions of $S$ for which Algorithm 2 computes local periods explicitly in line 5.

Due to the assignment in line 16, obviously, the loop in lines 10–11 performs at most $n$ iterations in total.
The loop in lines 12–15 performs exactly the same number of iterations as the loop in lines 10–11 plus $\mu(i)$
iterations for an appropriate $i \in \hat{S}$. Hence, the running time of the whole algorithm is $O(n + \sum_{i \in \hat{S}} \mu(i))$.
Thus, to prove that Algorithm 2 is linear, it suffices to show that $\sum_{i \in \hat{S}} \mu(i) = O(n)$.

Fix an arbitrary number $q$. Denote by $T(q)$ the maximal sum $\sum_{i \in S' \cap \hat{S}} \mu(i)$ among all contiguous
subsequences $S'$ of $S$ such that $\mu(i) \le q$ for each $i \in S'$ (note that we sum only through the positions of $\hat{S}$). We
are to show that $T(q) = O(q)$, which immediately implies $\sum_{i \in \hat{S}} \mu(i) = O(n)$ since the number $q$ is arbitrary
and $T(n) = \sum_{i \in \hat{S}} \mu(i)$.

We need one additional combinatorial fact. 


Lemma 14. Let $i$ be a position of $w$ with internal local period $\mu(i) > 1$. Suppose $j$ is a position
from $(i..i+\mu(i))$ such that $\mu(j') < \mu(i)$ for each $j' \in (i..j]$; then $w[j-\mu(j)..j+\mu(j)-1]$ is a substring of
$w[i-\mu(i)..i+\mu(i)-1]$.

Proof. Assume, for contradiction, that $j + \mu(j) > i + \mu(i)$. For each $h \in [i..i+\mu(i))$, denote by $\mu'(h)$
the local period at the position $h$ with respect to the substring $w[i..i+\mu(i)-1]$. Clearly $\mu'(h) \le \mu(h)$. By
Lemma 3, $w[i..i+\mu(i)-1]$ is unbordered and hence its minimal period is $\mu(i)$. By Theorem 1, there is
$h \in [i..i+\mu(i))$ such that $\mu'(h) = \mu(i)$. But for each $h \in [i..j]$, we have $\mu'(h) < \mu(i)$ and moreover, for
each $h \in (j..i+\mu(i))$, $\mu'(h) \le \mu(j) < \mu(i)$ because the local period $\mu'(j)$ is right external with respect to
$w[i..i+\mu(i)-1]$, a contradiction. □

Choose a contiguous subsequence $S' = \{i_1, \ldots, i_z\}$ of $S$ such that $\mu(i_j) \le q$ for each $j \in [1..z]$ and
$\sum_{i \in S' \cap \hat{S}} \mu(i) = T(q)$. As above, we associate with each $i_j$ the values $c_j$ and $r_j$ defined in Section 4. By an
inductive process described in Section 4, we construct a subsequence $\{i_{s_j}\}_{j=1}^{t}$ of $S'$. The following result
complements Lemma 11.

Lemma 15. Let $h \in (s_t..z]$ and $\mu(i_h) = \mu(i_{s_{t-1}}) > \frac{3}{7}\mu(i_{s_t})$; then for each $h' \in (h..z]$, we have
$i_{h'} \notin \hat{S}$.


Proof. We are to show that, informally, Algorithm 2 processes the position $i_h$ in the same manner as it
processed $i_{s_{t-1}}$ and the loop in lines 12–15 copies all required local periods $\mu(i_{h'})$ for $h' \in (h..z]$ to the array
$m$ immediately after the computation of $\mu(i_h)$. (Thus $i_{h'} \notin \hat{S}$ for $h' \in (h..z]$.)

Denote $a = w[c_{s_{t-1}}..c_{s_{t-1}}+\mu(c_{s_{t-1}})-1]$ and $b = w[c_{s_{t-1}}+\mu(c_{s_{t-1}})..c_{s_{t-1}}-\mu(c_{s_{t-1}})+\mu(i_{s_t})-1]$ (see
Figure 5). Note that $\mu(c_{s_{t-1}}) = \mu(i_{s_{t-1}}) = |a|$ and $\mu(c_{s_t}) = \mu(i_{s_t}) = |aab|$. Since $\frac{7}{3}|a| = \frac{7}{3}\mu(i_{s_{t-1}}) > \mu(i_{s_t}) =
|aab|$, we have $|b| < \frac{1}{3}|a|$. By Lemma 3, the string $a$ is unbordered. Denote $x = w[i_{s_t}-|aab|..c_{s_t}+|aab|-1]$
(see Figure 9). Clearly, $x$ is a substring of the infinite string $aab \cdot aab \cdot aab \cdots$ and the length of $x$ is at least
$2|aab|$ (recall that $i_{s_t}$ can coincide with $c_{s_t}$). Notice that the distance between $i_{s_t}$ and $c_{s_t}$ can be arbitrarily
large.

Figure 9: The internal structure of the string $x$ from the proof of Lemma 15.


Without loss of generality, assume that i_h is equal to the leftmost position i > c_{s_t} such that μ(i) =
μ(i_{s_{t-1}}) = |a|. (Since {i_1, ..., i_z} is a contiguous subsequence of S, i is certainly equal to i_h for some
h ∈ (s_t..z].) Obviously i_h ∈ (c_{s_t}..c_{s_t}+|aab|). It follows from the definition of i_h and from Lemma 11 that
for each i ∈ (c_{s_t}..i_h), we have μ(i) < [?]|a|. So, Lemma 9 implies that i_h - μ(i_h) = i_h - |a| ≤ c_{s_t}. Since
by Lemma [?] the string w[c_{s_t}..c_{s_t}+|aab|-1] is unbordered and thus cannot have the period |a| < |aab|, we
obtain r_h < c_{s_t}+|aab|. Thus, the string w[i_h-|a|..r_h] is a substring of x (see Figure 10). Now we must
specify where the position i_h can occur in x.






Figure 10: A location of i_h, c_h, and r_h inside x from the proof of Lemma 15.


By Lemma 10 for any i ∈ (c_{s_{t-1}}..c_{s_{t-1}}+|a|), we have μ(i) ≠ |a|. Hence i_h ∉ (c_{s_{t-1}}..c_{s_{t-1}}+|a|).
Moreover, since x is a substring of the infinite string aab·aab·aab··· and w[i_h-|a|..i_h+|a|-1] is a substring
of x, in the same way one can prove that i_h does not lie in the segments (c_{s_{t-1}}+|aba|..c_{s_{t-1}}+|abaa|),
(c_{s_{t-1}}+|abaaba|..c_{s_{t-1}}+|abaabaa|), ... (see Figure 10), i.e., informally, i_h cannot lie in the right half of an
occurrence of aa in x.

Suppose i_h ∈ [c_{s_{t-1}}+|a|..c_{s_{t-1}}+|ab|). Then, the string w[i_h-|a|..c_{s_{t-1}}+|a|], which is a suffix of a,
has an occurrence v = w[i_h..c_{s_{t-1}}+|aa|] (see Figure 7 with i = i_h). Since μ(i_h) = |a| > |b|,
v overlaps w[c_{s_{t-1}}+|ab|..c_{s_{t-1}}+|aba|-1] = a. Thus, a has a nontrivial border, a contradiction. By
the same argument, one can show that i_h does not lie in the segments [c_{s_{t-1}}+|abaa|..c_{s_{t-1}}+|abaab|),
[c_{s_{t-1}}+|abaabaa|..c_{s_{t-1}}+|abaabaab|), ...; in other words, i_h cannot lie in an occurrence of b in x.
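The contradiction in the preceding paragraph rests on a standard fact: if two occurrences of a string start less than its length apart, the part they share is a nontrivial border, so an unbordered string never overlaps itself. A minimal sketch of that fact follows; the toy strings a = "abb" and b = "a" and all function names are hypothetical choices made for illustration only.

```python
def border_lengths(s: str) -> list:
    """Lengths k (0 < k < len(s)) such that the prefix and suffix of length k agree."""
    return [k for k in range(1, len(s)) if s[:k] == s[-k:]]


def occurrences(text: str, pattern: str) -> list:
    """All starting indices of pattern in text, overlapping ones included."""
    return [p for p in range(len(text) - len(pattern) + 1)
            if text.startswith(pattern, p)]


def overlapping_pairs(text: str, pattern: str) -> list:
    """Pairs of occurrences of pattern that start less than len(pattern) apart."""
    occ = occurrences(text, pattern)
    return [(p, q) for p in occ for q in occ if 0 < q - p < len(pattern)]


# Hypothetical toy instance: a is unbordered and b is shorter than a.
a, b = "abb", "a"
x = (a + a + b) * 3  # a finite piece of aab·aab·aab···

if __name__ == "__main__":
    print(border_lengths(a), overlapping_pairs(x, a))
    print(border_lengths("aba"), overlapping_pairs("ababa", "aba"))
```

In the toy instance the unbordered string has no overlapping occurrences in x, whereas the bordered string "aba" overlaps itself in "ababa" at a shift equal to its length minus the border length.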

We have proved that i_h lies in the left half of an occurrence of aa in x, precisely, in one of
the segments [c_{s_{t-1}}+|ab|..c_{s_{t-1}}+|aba|], [c_{s_{t-1}}+|abaab|..c_{s_{t-1}}+|abaaba|], .... Figure 10 illustrates the case
i_h ∈ [c_{s_{t-1}}+|ab|..c_{s_{t-1}}+|aba|]; all other cases are similar. First, we show that c_h is equal to c_{s_{t-1}}+|aba|, i.e.,
c_h is the center of an occurrence of aa in x (see Figure 10). Obviously, the string w[i_h-|a|..c_{s_{t-1}}+|abaa|-1]
has the period |a| and therefore c_{s_{t-1}}+|abaa| ≤ r_h. The strings w[c_{s_{t-1}}+|ab|..r_h-1] and w[c_{s_{t-1}}-|a|..r_{s_{t-1}}-1]
are similar: they both have the period |a|, and w[r_h] ≠ w[r_h-|a|] and w[r_{s_{t-1}}] ≠ w[r_{s_{t-1}}-|a|]. Note that
the starting positions of these strings differ by |aab|. Furthermore, since r_h < c_{s_t}+|aab|, the strings
w[c_{s_{t-1}}+|ab|..r_h] and w[c_{s_{t-1}}-|a|..r_{s_{t-1}}] both are substrings of x and hence they are equal because x
has the period |aab|. Now since w[c_{s_{t-1}}+|ab|..r_h] is a suffix of w[i_h-|a|..r_h], it is straightforward that
c_h = c_{s_{t-1}}+|aba|.

To finish the proof, it suffices to show that Algorithm 2 does not compute explicitly the local periods at
the positions i_{h+1}, i_{h+2}, ..., i_z but obtains those local periods from the array m. For this purpose, let us first
prove that for each h' ∈ (h..z], the string w[i_{h'}-μ(i_{h'})..i_{h'}+μ(i_{h'})-1] is a substring of w[c_h-|a|..c_h+|a|-1].
This fact implies that, in a sense, after the processing of the position c_h Algorithm 2 is in a situation that
locally resembles the situation in which the algorithm was after the processing of the position c_{s_{t-1}} (see
Figure 11), i.e., Algorithm 2 examines exactly the same positions i_{h+1}, ..., i_z shifted by δ = c_h - c_{s_{t-1}}
or, more formally, i_{s_{t-1}+1} = i_{h+1} - δ, i_{s_{t-1}+2} = i_{h+2} - δ, ..., i_{s_{t-1}+z-h} = i_z - δ.



Figure 11: Local similarities between c_{s_{t-1}} and c_h in the proof of Lemma 15; for brevity, denote g = s_{t-1}. Here z = h + 3.


Let i be the leftmost position from (c_h..c_h+|a|) such that μ(i) > μ(c_h). Lemmas [?] and 10 imply that such
position always exists and μ(i) > 2μ(c_h) = |aa|. Since i ∈ (c_{s_t}..c_{s_t}+μ(c_{s_t})) and |aa| > [?]|aab| = [?]μ(c_{s_t}), it
follows from Lemmas [?] and 10 that μ(i) > 2μ(c_{s_t}). Hence, by the definition of the subsequence {i_{s_j}}, we
have i > i_z. Thus, for each h' ∈ (h..z], we have μ(i_{h'}) < μ(c_h) and i_{h'} ∈ (c_h..i). Therefore, by Lemma [?]
the string w[i_{h'}-μ(i_{h'})..i_{h'}+μ(i_{h'})-1] is a substring of w[c_h-|a|..c_h+|a|-1].

Suppose i_{s_t} ∈ S. Summing up the established facts, we obtain that since δ = c_h - c_{s_{t-1}} is a multiple of
μ(i_{s_t}) = |aab|, the loop in lines 12-15 performed immediately after the computation of the local period at
the position i_h in line [?] copies m[i_{h+1}-δ], m[i_{h+2}-δ], ..., m[i_z-δ], which are certainly filled with nonzero
values, to m[i_{h+1}], m[i_{h+2}], ..., m[i_z], respectively. Thus, Algorithm 2 does not compute explicitly the local
periods at the positions i_{h+1}, i_{h+2}, ..., i_z.

Suppose i_{s_t} ∉ S, i.e., m[i_{s_t}] and r[i_{s_t}] are nonzero at the time the algorithm reaches i_{s_t}. It follows
from Algorithm 2 that the values m[i_{s_t}] and r[i_{s_t}] are obtained from values m[i'] and r[i'] for some position
i' < i_{s_t} such that w[i'-m[i']..r[i']] = w[i_{s_t}-m[i_{s_t}]..r[i_{s_t}]]. Suppose i' ∈ S. Thus, when Algorithm 2
had calculated μ(i'), it passed through the positions i_{s_t+1}-δ, i_{s_t+2}-δ, ..., i_z-δ, where δ = i_{s_t} - i', stored
the corresponding local periods in m[i_{s_t+1}-δ], m[i_{s_t+2}-δ], ..., m[i_z-δ], and then copied those values to
m[i_{s_t+1}], m[i_{s_t+2}], ..., m[i_z], respectively, when it copied m[i'] to m[i_{s_t}]. Finally, suppose i' ∉ S. By an
obvious induction, one can prove that in this case m[i_{s_t+1}-δ], m[i_{s_t+2}-δ], ..., m[i_z-δ] are also filled with
correct values and thus the same argument shows that m[i_{s_t+1}], m[i_{s_t+2}], ..., m[i_z] are eventually set to
nonzero values. □

Suppose t > 1 and [?]μ(i_{s_{t-1}}) ≤ μ(i_{s_t}). As in Section [?], T(q) is determined by the recursion ([?]). Let
us estimate the sum of the terms from the parentheses of T(...) in the right hand side of ([?]). Since
< f/z(zsj, we have ln{igj + --- + ^n{ig,) < +^ ^ + ■■■) + ^niig,) < ]^q. 

Suppose t > 1 and [?]μ(i_{s_{t-1}}) > μ(i_{s_t}). Let h be the minimal number from (s_t..z] such that μ(i_h) = μ(i_{s_{t-1}})
(if it does not exist, assume that h = z). By the definition of the subsequence {i_{s_j}}_{j=1}^{t}, we have i_h ∈
(c_{s_t}..c_{s_t}+μ(c_{s_t})). Lemma 11 implies that μ(i) < [?]μ(i_{s_{t-1}}) < [?] for each i ∈ (c_{s_t}..i_h). Further, by
Lemma 15 we have i_{h'} ∉ S for each h' ∈ (h..z] and thus we can ignore these positions in our analysis. So,
combining Lemmas [?], one can deduce the following recursion:

T{q) < +/i(4) +'r +---+T + T Q^C***)^ 

Let us estimate the sum of the terms from the parentheses of T(...) in the right hand side of ([?]).
Since < <?. we have 5 m(*si) + ••• + ^niist-i) + ^K'i’st) < 5? + I? = I?- Clearly, the sum 

is bounded by 3q.

Finally, in the case t = 1 we have, by Lemmas and El Tiq) < +T{^fi{isJ). Obviously, 
the term from the parentheses of T{.. .), is less than or equal to ^q. 

Putting everything together, it is easy to see that T(q) is determined by the recursion T(q) ≤ 3q +
Σ_j T(q_j) for some terms {q_j} such that Σ_j q_j ≤ αq, where α = min{[?], [?], [?]} < 1. It is well known
that such recursion has the solution T(q) ≤ 3q + α·3q + α²·3q + ··· = 3q/(1-α) = O(q). Thus, the above analysis
of Algorithm 2 proves the following theorem.

Theorem 2. There is a linear time and space algorithm finding the leftmost critical point of a given string 
on an arbitrary unordered alphabet. 
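For testing an implementation on small inputs, the leftmost critical point can also be found directly from the definitions. The sketch below is a naive reference that takes roughly cubic time in the worst case; it is not Algorithm 2 and says nothing about how the linear bound is achieved. It assumes 0-based indices, with position i denoting the cut between w[i-1] and w[i], and uses only equality comparisons of symbols. All function names are hypothetical.

```python
def local_period(w: str, i: int) -> int:
    """Least p >= 1 such that w[k] == w[k + p] for every k in [i - p, i)
    for which both k and k + p are valid indices (1 <= i < len(w))."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[k] == w[k + p] for k in range(max(0, i - p), min(i, n - p))):
            return p
    raise AssertionError("unreachable: p = len(w) always qualifies")


def minimal_period(w: str) -> int:
    """Least p >= 1 such that w[k] == w[k + p] for all valid k (w nonempty)."""
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[k] == w[k + p] for k in range(n - p)))


def leftmost_critical_point(w: str):
    """Smallest cut i in [1, len(w)) whose local period equals the minimal
    period of w, or None if w has fewer than two symbols."""
    if len(w) < 2:
        return None
    p = minimal_period(w)
    for i in range(1, len(w)):
        if local_period(w, i) == p:
            return i
    return None  # not expected: the critical factorization theorem rules it out


if __name__ == "__main__":
    for word in ["abaab", "aaaa", "ab"]:
        print(word, minimal_period(word), leftmost_critical_point(word))
```

Comparing the output of a fast implementation against such a reference on all short strings over a two- or three-letter alphabet is a cheap sanity check.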

7. Conclusion 

We have shown that the problems of computing a critical factorization on unordered and on ordered
alphabets both have linear time solutions. This is in contrast with the seemingly related problem of finding
repetitions in strings (squares, in particular): in the case of an unordered alphabet one cannot even check
in o(n log n) time whether the input string of length n contains repetitions, while in the case of an ordered
alphabet there are fast o(n log n) time checking algorithms (see [5], [10], [11], [?]).
The search for similarities between these problems was our primary motivation for the present work.
Our result shows, however, that the restriction to unordered alphabets does not add considerable
computational difficulty to the calculation of a critical factorization, unlike the problem of
finding repetitions; so the two problems are not similar in this respect.

As a byproduct, we have obtained the first generalization of the constant space string matching algorithm
of Crochemore and Perrin [3] to unordered alphabets. However, this generalization requires nonconstant
space in the preprocessing step. So, it is still an open question to find a linear time and constant space
algorithm computing a critical factorization (not necessarily the leftmost one) of a given string on an
arbitrary unordered alphabet. Using such a tool, one can possibly obtain a constant space string matching
algorithm that is simpler and faster than the well-known algorithm of Galil and Seiferas [7].

Acknowledgement. The author would like to thank Arseny M. Shur for helpful discussions and the
invaluable help in the preparation of this paper.